CSR Data Collection
Authors
Abstract
The CSR Development and Evaluation Spokes data collection task yielded 4435 development test utterances from 30 speakers and 4878 evaluation test utterances from a different set of 30 speakers. The development test data covered eight different spoke conditions, each of which had its own distinct combination of subject, prompt text, microphone and recording environment requirements. Similarly, the evaluation test data covered nine different spoke conditions and two hub conditions, each of which had its own unique requirements.
Related resources
CSR Data Collection Pilot
The objective of the CSR Corpus Development is to collect and deliver a large corpus of continuous speech data to support DARPA research efforts in continuous speech recognition (CSR). The CSR corpus is intended to be task independent and to consist of speech that is similar to that which would be expected from eventual users of real world CSR systems. Toward these ends, the current pilot colle...
Spontaneous Speech Collection for the CSR Corpus
As part of a pilot data collection for DARPA's Continuous Speech Recognition (CSR) speech corpus, SRI International experimented with the collection of spontaneous speech material. The bulk of the CSR pilot data was read versions of news articles from the Wall Street Journal (WSJ), and the spontaneous sentences were to be similar material, but spontaneously dictated. In the first pilot portion ...
NIST-DARPA Interagency Agreement: Spoken Language Program
1. To coordinate the design, development and distribution of speech and natural language corpora for the DARPA Spoken Language research community. 2. To design, coordinate implementation, and analyze results, of performance assessment "benchmark tests" for DARPA's speech recognition and spoken language understanding systems. 1. Completed production of the six-CD-ROM-set for ATIS0, and made this...
On the Mapping of Index Compression Techniques on CSR Information Retrieval
Information retrieval is the selection of documents relevant to a query. The inverted index is the conventional way to store the index of the collection. Because of the large amounts of data, compression techniques are commonly used in information retrieval systems to reduce the size of the inverted index. We experimentally evaluate the result of the mapping of such techniques on the Compressed Spa...
CSR Corpus Collection
Subject Efficiency: SRI's first goal was to speed up subject interaction with the data collection software. Additional memory was added to the data collection systems, and the data collection software was made much faster, so that now the pace of the data collection process is directly controlled by the subject and no longer limited by the software. As a result, the average data collection pace has incr...